ABSTRACT

National Assessment is a data gathering project
designed to provide information, in ten subject areas, about
knowledge, skills, understandings and attitudes of young people in
this country, and to assess changes in these variables over time. The
data are collected and reported at the item level. Each exercise was
developed with emphasis on content validity, and is geared to sample
a specific objective within a subject area. A striking feature of the
first National Assessment report is that there are no scores or norms
with which to compare results. Instead the individual exercises with
the percent choosing or producing each response (p-values), both
correct and incorrect, are given. This technique allows the reader to
evaluate results and draw inferences for himself rather than just
review an average or summary. Also, by looking at the p-values of
wrong responses, considerable light may be shed on commonly held
misconceptions. Generalizations discussed are based on exercises from
the subject areas of Science and Citizenship, with only partial
National Assessment results available for the latter, and are drawn
by looking at the exercises as a total set of exercises, not as a
total score. They are not to be construed as representing the
National Assessment's viewpoint, as the selection of these
generalizations, rather than others that might be drawn from the
data, is a personal one. (Author/CK)
ABOUT THIS REPORT

Few educational projects have endured the 
controversy and demonstrated the promise of 
National Assessment. An early fear was that 
National Assessment would be used by "central 
authorities" to rank the quality of local schools. 
The project guards against such misuse by protect- 
ing the anonymity of individual schools and 
systems. But this educational census does make it
possible for local educators to make local studies 
of achievement. This is a major strength of the 
undertaking. 

Local and national judgment of the effective- 
ness of American education is greatly facilitated by 
National Assessment's concentration upon indi- 
vidual items and their relationship to school objec- 
tives. This means of reporting data puts the 
emphasis on educational outcome in terms that 
have some absolute meaning. And it makes it 
possible to take into account local conditions and
local goals. 

In this report Dr. Womer cites some prelimi- 
nary results of National Assessment to illustrate 
the important contribution it can make to educa- 
tion. As staff director he is a well-informed spokes- 
man. Dr. Womer has had a distinguished career as 
school teacher, editor, and professor at the Univer- 
sity of Michigan. He is well-known for his writings 
and professional services, especially as past
President of NCME.
FRANK B. WOMER

More young adults between the ages of 26 
and 35 (9 out of 10) are aware of the fact that the 
President does not have the right to do anything 
affecting the United States that he wants to do 
than are 17-year-olds (8 out of 10), 13-year-olds (7 
out of 10), or 9-year-olds (5 out of 10). The 
question on which these results are based and the 
results themselves are presented in Example I in
the box on page 3. It comes from a Citizenship
exercise for National Assessment. While most of 
the young adults in the National Assessment sam- 
ple could state an acceptable reason for their 
answer (8 out of 10), the younger age groups did 
not do as well (only 2 out of 10 of the 9-year- 
olds). These results suggest that for this specific bit 
of information there is continuing growth through 
the school years and even into young adulthood. 

National Assessment* is a data gathering proj- 
ect designed to provide specific information about 
knowledges, skills, understandings, and attitudes of 
young people in this country. The data are col- 
lected and reported at the item level, with each 
item geared to sample a specific objective within a 
subject area. This information provided by 
National Assessment has not been available pre- 
viously. Example I is one exercise released in the 
first National Assessment reports which covered 
Science and part of Citizenship results. Many additional
reports will be forthcoming over the years,
in these two subject areas as well as in eight others
now scheduled.

Perhaps the most striking feature of National 
Assessment’s first reports is that there are no scores 
or norms - just individual exercises (questions, 
items) along with the percent choosing or producing
each response (p-values), both correct and
incorrect, for each exercise. Those who developed
the plan for National Assessment felt that the best
way to describe what young people know is to
present the questions or tasks that they were asked
along with information about how well they per- 
formed. This directs attention at actual samples of 
behavior rather than at some summation of behav- 
iors. It allows the reader of the reports to make his 
own evaluation of each exercise. He can accept the 
results as meaningful information useful in teach- 
ing and/or curriculum evaluation and/or policy 
making and/or allocation of educational funds, etc. 
He can reject a question as meaningless or in- 
appropriate if his judgment leads him to that 
conclusion. The point is that the reader of a report 
has all of the results before him rather than an 
average or a summary or a conclusion. 

With this type of reporting in mind, National 
Assessment developed its exercises with an eye to 
content validity, as judged by subject matter specialists,
other educators, and laymen. The exercises
were not item analyzed (there is no total score) nor
were they related to future performance (there are
no criterion measures). The purpose of National 
Assessment exercises in toto is to describe, by 
example, what most young people know and can 
do, what about half can do and what very few can 
do. The purpose of a single exercise is to stand as 
one example of a meaningful knowledge or skill or 
attitude that relates to a specific objective in a 
given subject area. 

This type of report is "dangerous" because it 
exposes each question or task to critical examination
by a reader. Each of the nearly 200 Science
exercises released in 1970 is subject to scrutiny for 
individual imperfections. And even after being 
reviewed by between 12 and 20 persons, some 
exercises are still less than perfect. But this type of 
report also is "courageous" because it allows acceptance
or rejection of each exercise individually.
A reader is not presented with generalizations 
based on materials that he has never seen. 

Another striking feature of National Assessment's
initial reports is that there are no standards
or norms against which one can compare the
results. Consider Example II. In that Science exercise
two-thirds of the 17s and half of the adults
knew the correct answer (indicated by the 
blackened circle). But is two-thirds good or bad? Is 
half good or bad? There is no statistical reference 
point. Several science educators who reviewed 
these results have indicated dissatisfaction that
more respondents did not know about the interrelationship
of animal and plant life in an ecosystem.
But such judgment relates to an expectation,
an internal personal standard. And it is
exactly this type of standard, personal judgment, 
that must be used to draw certain types of conclu- 
sions from the results — particularly the type of 
conclusion that attempts to judge whether or not 
young people are learning what they "should" 
learn. Should, in this context, must be a personal, 
thoughtful judgment. 

One of the initial reviewers of Example II was 
not disappointed because of any feeling that more 
17s and more adults should know about eco- 
systems. But she was disappointed by the results of 
this exercise because, in her judgment, it is more of 
a reasoning exercise than a knowledge exercise. 
And she felt that more 17s and more adults should
have been able to determine the answer by logical
deduction from the information given. This illustrates
the fact that different readers will have
different insights into National Assessment results. 

Eventually National Assessment will generate 
its own standards, although not in the same sense 
as those established for the usual standardized test. 
Only half or fewer of the exercises administered by 
EXAMPLE I

A. Does the President have the right to do anything
affecting the United States that he wants to do? (Yes,
No, I don't know)

B. (If yes) Why? (Part B was not scored; it was asked to 
insure that respondents understood Part A and to give 
them a chance to explain their position.) 

C. (If no) Why not? 

(If answer to C is vague) Who or what would stop him 
from doing what he wants? 

Acceptable reasons to C (examples): People could stop him; 
elected officials could stop him; checks and balances system 
of government; laws stop him; country would be a dictator- 
ship; not the democratic way. 

Unacceptable reasons to C (examples): Police or Vice 
President would stop him; he wouldn't be doing his job; he 
might do something that could hurt the country; he would 
be doing what is right; people vote for him not to; he can't 
do it; everybody, even the President, has some limitations; 
he just advises us; he can't do everything since he is only 
one person. 



Results

                                              Age
                                       9     13     17    Adult

Stated that the President does
not have the right to do anything
affecting the United States
that he wants (No to A)              49%    73%    78%    89%

Stated that the President does
not have the right and gave an
acceptable reason (acceptable
reason to C as well as No to A)      18     53     68     80


National Assessment in a given year are released 
that year. The others are retained to be used again 
three or six years hence. At that time it will be
possible to compare the results from a second or 
third assessment with those obtained previously. 
Then one can see whether change (progress?) is 
taking place in the knowledges and skills of young 
people over time. This is the ultimate goal of 
National Assessment — to measure changes in 
knowledges, skills, and attitudes over time.



Since the first results of National Assessment
are benchmark data, they provide neither instantaneous
"indictments" of American education nor
instantaneous "whitewashing." This fact has been 
of considerable disappointment to persons who 
looked upon the project as one which will provide
"answers" to all sorts of educational questions. An 
information-gathering project is not designed to 
provide such answers. But it can and should provide
decision-makers with information useful in
decision-making. Hopefully National Assessment 
will do just that. 

Even though the ultimate goal of National 
Assessment is to assess change, the first results do 
point to some generalizations of considerable im- 
port, as well as illustrating specific knowledges and 
skills and attitudes that young people have and 
have not attained. The generalizations discussed 
here are based on total national results for Science 
and on partial national results for Citizenship. 
Later reports will include comparisons, item by 
item, for four geographic regions, size and type of 
community, sex, color (black versus non-black), 
and an educational index of the home. 









EXAMPLE II

In a particular meadow there are many rabbits that eat the
grass. There are also many hawks that eat the rabbits. Last
year a disease broke out among the rabbits and a great 
number of them died. Which of the following probably 
then occurred? 








Results

  Age 17   Adult

     4%      2%    ○  The grass died and the hawk population decreased.
     1       1     ○  The grass died and the hawk population increased.
    68      52     ●  The grass grew taller and the hawk population decreased.
     4       4     ○  The grass grew taller and the hawk population increased.
    20      30     ○  Neither the grass nor the hawks were affected by the
                      death of the rabbits.
     2      10     ○  I don't know.
     1       1        No response
   100%    100%


The following generalizations are based on 
looking at the exercises as a total set of exercises, 
but not as a total score. An attempt is made to 
identify what the data say versus what the author 
says. The author's views are not to be construed as 
interpretations representing a National Assessment 
viewpoint, and the selection of these generaliza- 
tions rather than others that might be drawn from 
the data is a personal one. 



The evidence for this statement is based on 
"overlap" exercises, those administered to more 
than one age level. Consistently on the overlap 
exercises 17s did better than 13s and 13s did better
than 9s. There were 15 overlaps between 9s and 
13s for Science and 17 for Citizenship. All of the 
Science overlaps and 13 of the 17 Citizenship 
overlaps favor the 13s over 9s. There were 23 
overlaps between 13s and 17s for Science and 73 
for Citizenship. All of the Science overlaps and 47 
of the 73 Citizenship overlaps favor the 17s over 
13s. Note that the generalization does not take a 
position as to whether the schools, or other social 
organizations, or the family, or any other causal 
factor or combination of factors are responsible. 
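
The overlap counts quoted above can be tabulated to make the pattern easier to see. The following is a minimal sketch, not part of the National Assessment reports; the counts come from the paragraph above, while the dictionary layout and the function name `summarize` are assumptions made for illustration.

```python
# Overlap tallies quoted in the text:
# (exercises favoring the older group, total overlap exercises).
overlaps = {
    ("Science", "9 vs 13"): (15, 15),
    ("Citizenship", "9 vs 13"): (13, 17),
    ("Science", "13 vs 17"): (23, 23),
    ("Citizenship", "13 vs 17"): (47, 73),
}


def summarize(tallies):
    """Return {key: (exercises not favoring the older group, percent favoring it)}."""
    summary = {}
    for key, (favoring_older, total) in tallies.items():
        not_favoring = total - favoring_older
        percent_older = round(100 * favoring_older / total)
        summary[key] = (not_favoring, percent_older)
    return summary


summary = summarize(overlaps)
for (subject, ages), (not_favoring, percent_older) in summary.items():
    print(f"{subject:12s} {ages:9s} older group ahead on {percent_older}% "
          f"of overlaps; {not_favoring} not favoring the older group")
```

Laid out this way, every Science overlap favors the older group, while all of the exceptions fall in Citizenship.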

A statement that learning is taking place as 
young people proceed through the school years is 
not exactly revolutionary. Observation and common
sense have indicated as much. But National
Assessment documents this generalization in terms 
of specific knowledges and skills. For example,
when students were asked to read a chart depicting 
seven weights, 80 percent of the 9s were able to 
identify the greatest weight, whereas 92 percent of 
the 13s were able to do so. (There was a single 
number that was greatest - 64 pounds.) From the 
same chart 54 percent of the 9s could identify the 
smallest weight (distinguish between 2 pounds and 
2 ounces) whereas 81 percent of 13s could do it. 
When asked about what happens when scientists 
carefully measure any quantity many times, 69 
percent of the 13s correctly selected the alternative 
"most of the measurements will be close but not 
exactly the same" while 72 percent of the 17s did 
so. When asked to write the names of the presi- 
dent, vice-president, secretary of state, secretary of 
defense and five other federal office holders, 17s 
did better than 13s for each of the nine positions 
(ranging from 2 percent of the 13s who could write 
McCormack as the speaker of the house to 98 
percent of the 17s who could write Nixon as the 
president). 

Perhaps the greatest utility of this specific 
type of information from overlap exercises will be
as an aid to understanding when growth is taking
place in specific skill and knowledge areas. The 
difference of 27 percentage points between 54 
percent and 81 percent for the one exercise quoted
above is substantial. The difference of three per- 
centage points between 69 percent and 72 percent 
is not. If one wanted to generalize from this to a 
statement about the types of scientific knowledges 
on which 13s do better than 9s or 17s better than
13s, a much larger number of exercises would be 
needed. 
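
The comparison of gains between age groups is simple subtraction of p-values, and can be sketched as follows. The percentages are those quoted in the text; the exercise labels and the helper `gain_in_points` are hypothetical names chosen for this illustration.

```python
# Percent answering correctly: (younger group, older group), as quoted in the text.
exercises = {
    "greatest weight on chart (9s vs 13s)": (80, 92),
    "smallest weight on chart (9s vs 13s)": (54, 81),
    "repeated measurements (13s vs 17s)": (69, 72),
}


def gain_in_points(younger, older):
    """Difference in percentage points between the older and the younger group."""
    return older - younger


gains = {name: gain_in_points(*pair) for name, pair in exercises.items()}

# List the exercises from largest to smallest gain.
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: +{gain} points")
```

Three exercises are far too few to support a general statement; the sketch only shows the item-by-item bookkeeping.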

It may be noted that all of the reversals in the 
exercises initially reported are in Citizenship. For 
four of the overlaps 9s did better than 13s and for
26 overlaps 13s did better than 17s. One also may
note that many of the Citizenship exercises are 
attitudinal whereas most of the Science exercises 
require knowledges and skills. But again, until all 
of the Citizenship exercises are analyzed it would 
be hasty to come to any conclusion on this matter. 



There were 58 Science and 57 Citizenship 
exercises administered both to 17s and adults. Of 
these, 38 Science and 10 Citizenship overlaps 
yielded higher p-values for the correct answer for 
17s, while 20 Science and 47 Citizenship overlaps 
yielded higher p-values for adults. Because the 17s
did better on more Science exercises than the 
adults one might be tempted to assume general 
superiority of the 17s. However, that would be a 
hasty and unsupported generalization. Careful examination
of the overlap exercises has led several
reviewers to the conclusion that the exercises for 
which 17s did better tend to be of a different type 
than the ones for which adults did better. The 
Science report states: * 



"Examination of the released exercises suggests
that Adults do as well or better than
17s when asked questions which they may
know from personal experience, whereas 17s
do better on exercises which require formal
education. Thus, Adults do better than 17s
on two exercises which call for knowledge on
human reproduction and on an exercise
about fuses. On the other hand, more 17s
than Adults successfully chose the response
'electrons' when asked, 'An electric current
in a copper wire involves mainly the movement
of . . .' and given five alternatives.
Similarly, 17s were more successful on, 'Two
light waves are traveling in a vacuum. The
wave with the higher frequency will have the
(shorter wave length).' While 32 percent of
the 17s chose the correct response, 20 percent
of the Adults chose it."



* Science: National Results, 1970. National Assessment of Educational
Progress, Ann Arbor, Michigan, 1970.
"The seven released exercises that were answered
correctly by most Adults deal with
nontechnical information that might be
found in newspaper or magazine articles deal- 
ing with scientific topics or in television 
programs on science. Five of the seven might 
be considered to have to do with biology or 
medicine. 

"The exercises that few Adults answered 
correctly are quite technical in nature, invol- 
ving knowledge that is likely to be learned 
only in school and is reinforced by experi- 
ence by few young adults (e.g., the periodic 
table)."

From these remarks it must be concluded that 
a different balance of Science exercises could have
yielded a different balance of results for 17s and 
adults. Although adults answered correctly more 
often than 17s in most of the Citizenship overlaps, 
the exercises in the first partial results are heavily 
attitudinal or deal with knowledge of government. 
When all of the Citizenship results are available this 
trend may or may not be evident. 
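
The 17-versus-adult overlap counts can be put side by side in the same way. This is a minimal sketch using only the counts quoted above; the structure and the name `share_for_17s` are assumptions for illustration, and the shares say nothing about why one group did better.

```python
# Overlap exercises given to both 17s and adults, as quoted in the text:
# number of exercises on which each group had the higher p-value
# for the correct answer.
higher_p_value = {
    "Science": {"17s": 38, "adults": 20},
    "Citizenship": {"17s": 10, "adults": 47},
}


def share_for_17s(counts):
    """Return (total overlap exercises, percent on which 17s were ahead)."""
    total = counts["17s"] + counts["adults"]
    return total, round(100 * counts["17s"] / total)


shares = {subject: share_for_17s(counts) for subject, counts in higher_p_value.items()}
for subject, (total, percent_17s) in shares.items():
    print(f"{subject}: {total} overlaps, 17s ahead on {percent_17s}%")
```

As the reviewers' remarks indicate, these shares depend on the mix of exercises, so a different selection could shift them.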

The generalization that young adults some- 
times show greater achievement than 17s in 
Science and Citizenship may, like the first gener- 
alization discussed, seem to many readers to be 
another bit of "common sense", but documenta- 
tion of common sense can have its own utility. 
Documentation can focus attention upon an area 
of learning in a way that common sense may not. 
This simple generalization helps to remind us that 
much learning takes place outside of schools, that
"textbook" learning may have limited utility (if
textbook implies rote memory primarily), and that 
if the ultimate goal of education is an enlightened 
citizenry one needs to examine carefully what 
knowledges and skills adults truly need to acquire 
and retain. 



In producing the exercises to be used in 
National Assessment, item writers were given 
several criteria to follow. Among them were: 1) to
produce exercises with high content validity which 
relate to specific objectives, and 2) to produce 
exercises that were very easy (one-third), very 
difficult (one-third), and in-between (one-third).
Thus, Science item writers were asked to produce
exercises that almost all young people could 
answer correctly and that were criterion-referenced 
and meaningful, to produce very difficult exercises 
that also were meaningful, and so on. The primary 
problem here was to sample each difficulty level 
without artificial manipulation. When pressed by 
the author to produce a greater number of easy 
exercises, one writer responded by saying that he
could do it easily by substituting ridiculous foils 
for logical foils in multiple-choice exercises. This 
response ignored the criterion of high content 
validity. 

If item writers had been able to do exactly 
what had been asked of them, the results would
have produced trimodal distributions of p-values.
What were in fact produced were essentially rectangular
distributions, definitely not normal distributions
of p-values (see pages 8-11 of Science:
National Results, 1970). For ages 9 and 13 there
was a deficiency of very difficult exercises (p < .25)
whereas 17 and adult results were better spread 
over the total range of p-values. This may seem 
surprising when one considers that most of the 
Science exercises were multiple-choice. One might 
have expected truncated distributions with very 
few exercises with p-values below the chance pro- 
bability levels, but this did not occur. The question 
of chance was discussed at great length prior to the 
first assessment. In an attempt to minimize guess- 
ing (and based on specific research results) an "I 
don't know" choice was added to almost every 
multiple-choice exercise. It was elected as an 
option by quite a few respondents, particularly for 
the very difficult questions. 

But what is the import of the generalization 
that there are meaningful knowledges and skills at 
all difficulty levels? It suggests that all young 
people have acquired meaningful knowledges and 
skills that relate directly to objectives of instruc- 
tion in Science and in Citizenship. Most people 
probably would have paid lip service to this statement
prior to any results of National Assessment,
but unfortunately too many of us (including us
teachers, who should know better) have acted as if 
we felt that some youngsters were completely 
devoid of useful skills or knowledges. As National 
Assessment results are accumulated over the years,
it should be possible to develop a picture of what 
knowledges and skills all 9s, 13s, and 17s have 
attained. Whether we will be satisfied with that
picture is another question, but at least we will 
know where students stand. This should be of 
considerable help in planning for group learning 
experiences, in avoiding knowledge already ac- 
quired and in building knowledge not yet acquired. 

The results in Example III suggest that society
has done a fairly good job in getting young people 
to indicate lack of bias toward people of other 
races, in a paper and pencil situation. The obvious 
response to this is that what people say they would 
be willing to do and what they really do may not 
be the same. Nevertheless, if young people did not 
indicate tolerance, there would be little chance for 
further progress in race relations. To the author the 
most disturbing aspect of Example III is that
EXAMPLE III*

People feel differently toward people of other races. How
willing would you be to have a person of a different race
doing these things?

[For each situation below, the choices were: Willing to, prefer
not to]

Results (% willing to)

                                            Age
                                     13     17    Adult

A. Be your dentist or doctor?       81%    74%    75%
B. Live next door to you?           83     77     67
C. Represent you in some
   elected office?                  81     82     8?
D. Sit at a table next to yours
   in a crowded restaurant?         80     90     88
E. Stay in the same hotel or
   motel as you?                    88     92     89

Willing to for
   one or more of the above         96     97     93
   two or more . . .                94     94     90
   three or more . . .              89     88     86
   four or more . . .
   all five . . .                   56     56     56

*Not administered to the in-school sample in one large western
state, one southeastern county and one southwestern city at the
request of state or local authorities.



almost half (44 percent) of the respondents at each 
age level did indicate an unwillingness to accept a 
person of another race on at least one of the five 
categories. 

Samples of information known to very few 
young people present a situation of a different 
sort. They must be related to some standard of 
whether we would (should) ever expect large numbers
of students to acquire that skill or knowledge.
For example, one might not be at all concerned 
that only 6 percent of 17s can identify tin and 
sulfur as the two elements which have been oxi- 
dized, when shown a specific chemical equation. 
On the other hand one might be quite disturbed (as 
several reviewers were) that only 7 percent of 9s
could correctly answer the Science Exercise in
Example IV. In this Exercise most 9s simply added
the two temperatures rather than taking the
average. Whether this is because 9s don't under- 
stand the concept of an average, whether it is 
because word problems bother them, whether the 
word Fahrenheit was a problem for 9s, or whether 
there is some other reason, I suspect most of us 
would feel that this is the sort of skill that must be 
acquired by more than 7 percent of our young 
people, at some appropriate age. 
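
The arithmetic behind Example IV can be sketched in a few lines. The sketch assumes equal amounts of the same liquid and no heat loss to the surroundings, so that the mixture settles at the volume-weighted average; the function name `mixed_temperature` is hypothetical.

```python
def mixed_temperature(volumes, temperatures):
    """Volume-weighted mean temperature, assuming one liquid and no heat loss."""
    total_volume = sum(volumes)
    return sum(v * t for v, t in zip(volumes, temperatures)) / total_volume


# One pint at 50 degrees F mixed with one pint at 70 degrees F.
correct = mixed_temperature([1, 1], [50, 70])

# The response chosen by most 9s: the two temperatures added together.
common_error = 50 + 70

print(correct, common_error)
```

The two results correspond to the 60°F and 120°F alternatives in the exercise.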

This generalization about the breadth and 
depth of knowledges that young people have will 
hearten some and frighten others. Such information
can help to focus our attention where it ought
to be in education - on what young people are or 
are not learning and on what attitudes they are or 
are not developing. And this is accomplished with 
specific examples rather than summary statistics. 



When item writers prepared exercises for 
National Assessment, they were asked to classify
each one as 10 percent (very difficult), 50 percent 
(moderately difficult), or 90 percent (very easy). 
As various subject matter reviewers examined the 
exercises, they were asked to note instances where 
they disagreed with the writer's estimate of difficulty.
Some of the 10-50-90 designations were
changed; most were not. 



EXAMPLE IV

A pint of water at a temperature of 50° Fahrenheit is
mixed with a pint of water at 70° Fahrenheit. The
temperature of the water just after mixing will be about

Results

  Age 9

     4%    ○  20°F.
     2     ○  50°F.
     7     ●  60°F.
     5     ○  70°F.
    69     ○  120°F.
    12     ○  I don't know.
     0        No response
    99%




When reporting the results for Science, three 
categories of correct responses were established: 
rather few (0-33 percent), good many (34-66 per- 
cent), and most (67-100 percent). The original 
writers' and reviewers' estimates have now been 
plotted against the actual results. Of 498 com- 
parisons made for Science, 339 (68 percent) were 
the same whereas 159 (32 percent) were different. 
Thus the item writers and reviewers judged dif- 
ficulty correctly two-thirds of the time and judged 
incorrectly one-third of the time. In the author's
opinion, this is not outstanding success. This is not 
to suggest that the particular writers and reviewers 
for National Assessment were poor judges, since it 
is not known whether other writers could have 
done better or whether writers in other subject 
areas could have done better. 

Across the four age groups the percentages of 
agreement were 70, 65, 70, and 67 for ages 9, 13, 
17 and adult. Thus, judgments were quite similar 
across ages. The writers were correct 80 percent of 
the time for the easy exercises (the 90s) whereas 
they were correct only 60 percent of the time for 
the others (50s and 10s).
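
The comparison of estimated and observed difficulty can be sketched as follows. The reporting bands and the agreement counts are taken from the text; the mapping from a writer's 10/50/90 label to a band is an assumption made for this illustration, and the function names are hypothetical.

```python
def band(percent_correct):
    """Reporting band used for Science results (0-33, 34-66, 67-100 percent)."""
    if percent_correct <= 33:
        return "rather few"
    if percent_correct <= 66:
        return "good many"
    return "most"


# Assumed mapping from a writer's difficulty label to the band it predicts.
PREDICTED_BAND = {10: "rather few", 50: "good many", 90: "most"}


def estimate_agrees(label, observed_percent):
    """True when the observed p-value falls in the band the label predicts."""
    return PREDICTED_BAND[label] == band(observed_percent)


# Agreement counts quoted in the text for Science.
same, different = 339, 159
total = same + different
agreement_rate = round(100 * same / total)
print(f"{same} of {total} comparisons agree ({agreement_rate} percent)")
```

For instance, under this assumed mapping an exercise labeled 90 that only 54 percent answered correctly would count as a disagreement.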



These results raise the question of whether it 
is possible for adults to do a really good job of 
estimating item p-values of students. One early 
critic of National Assessment suggested to the 
author that since we were asking writers to prepare 
exercises with difficulty levels of 10, 50, and 90, 
we really did not need to collect data. The results, 
however, indicate that in the final analysis it is 
9-year-olds who must tell us what 9-year-olds
know, that it is 13-year-olds who must tell us what
13-year-olds know, and so on.



Of the 49 overlap exercises between ages 17 
and adult which included "I don't know", adults 
chose the "I don't know" more often than 17s for 
41 exercises while 17s used it more often for the 
other eight. One might assume, since 17s actually 
did better on more exercises than adults, that the 
adults were just realistic about their lack of knowl- 
edge. The results do not support this assumption, 
however, since 17s were also wrong more often 
than adults (determined by adding p-values for 
incorrect foils). Thus, it is safer to conclude for the
49 overlap Science exercises that adults are less 
willing to guess than 17s. Why this is so is another
question. The author's personal hypothesis is that 
school-age youngsters are so geared to guessing on 
multiple choice exercises (and rightfully so) that 
many of them never seriously considered the "I
don't know" alternative. Adults, who are assessed
in their own homes, probably are less concerned
about a "score" that they might be achieving than 
about giving their best response or straightforwardly 
admitting a lack of knowledge. Whether it is good 
or bad that 17s seem to be prone to more guessing 
than adults is a debatable question. 
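
The step of "adding p-values for incorrect foils" can be illustrated with a single exercise. The sketch below uses the percentages as read from the Example II box; the data layout and the name `wrong_percent` are assumptions for illustration. One exercise need not follow the pattern found across all 49 overlaps.

```python
# Example II results (percent), as read from the exercise box.
example_ii = {
    "17s":    {"correct": 68, "i_dont_know": 2,  "no_response": 1, "foils": [4, 1, 4, 20]},
    "adults": {"correct": 52, "i_dont_know": 10, "no_response": 1, "foils": [2, 1, 4, 30]},
}


def wrong_percent(result):
    """Percent choosing any incorrect foil (sum of the foil p-values)."""
    return sum(result["foils"])


# For each group: (percent correct, percent wrong, percent "I don't know").
breakdown = {
    group: (r["correct"], wrong_percent(r), r["i_dont_know"])
    for group, r in example_ii.items()
}
for group, (correct, wrong, unsure) in breakdown.items():
    print(f"{group}: {correct}% correct, {wrong}% wrong, {unsure}% I don't know")
```

On this particular exercise adults chose "I don't know" more often than 17s but were also wrong more often, which is why the conclusion in the text rests on the full set of overlaps rather than on any one item.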



The fact that trained item writers and reviewers
were "surprised" with respect to estimated
difficulty fully one-third of the time is strong 
support for this statement. Noneducators probably 
will be surprised even more. Evidence for this
assumption comes from the initial newspaper arti- 
cles written about National Assessment results. The 
education writer for The Washington Post was 
unhappily surprised that, while 70 percent, 91 
percent, and 92 percent of the 13s, 17s, and adults
respectively knew that the Senate was the second 
of the two houses of Congress, 17 percent, five 
percent, and four percent respectively thought that
the Supreme Court was the answer. From a 
psychometric viewpoint, any p-value above 90 
percent might seem to be good, but to a Washington
reporter anything less than 100 percent on
such an item is unthinkable. The same reporter was 
disturbed that only half of the 17s and adults 
could solve this problem correctly: "A motor boat 
can travel five miles an hour on a still lake. If this 
boat travels downstream on a river that is flowing 
five miles per hour, how long will it take the boat 
to reach a bridge that is 10 miles downstream?" 

One reviewer of the National Assessment 
Science results finished his comments by listing 
eight pleasant surprises and 10 unpleasant surprises.
He was pleased that 89 percent of 17s knew
that living dinosaurs have never been seen by man
("The Flintstones notwithstanding") and was displeased
that only 33 percent of 17s and 25 percent
of adults knew that doubling the linear dimensions 
of a cube increases its volume eightfold. 
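
Two of the exercises mentioned above, the motor boat problem and the cube problem, come down to one line of arithmetic each. The sketch below works them through; the function names are hypothetical, and the boat calculation assumes the current simply adds to the boat's still-water speed.

```python
def downstream_hours(distance_miles, boat_mph, current_mph):
    """Travel time downstream: still-water speed plus the speed of the current."""
    return distance_miles / (boat_mph + current_mph)


def volume_ratio(linear_scale):
    """Factor by which a cube's volume changes when each edge is scaled."""
    return linear_scale ** 3


# Boat: 5 mph in still water, river flowing 5 mph, bridge 10 miles downstream.
boat_answer = downstream_hours(10, 5, 5)

# Cube: every linear dimension doubled.
cube_answer = volume_ratio(2)

print(boat_answer, cube_answer)
```

The boat covers 10 miles at an effective 10 miles per hour, and the doubled cube has eight times the volume, matching the eightfold increase cited by the reviewer.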

Readers of the National Assessment reports 
may want to play the same game — estimating 
what they think students know and can do in
specific exercises before looking at the p-values. 
Another potentially fruitful approach for teachers 
and curriculum specialists is to look at the p-values
of wrong responses as well as of the correct 
responses. Such analyses have the potential of 
shedding considerable light on specific misconceptions
that are commonly held. For teachers,
broad generalizations that may be abstracted from 
National Assessment results may be of much less 
interest than specific, item-by-item analyses. Such 
analyses probably are best done by subject matter 
specialists rather than measurement personnel, 
though certainly most NCME members could have
considerable positive input to such analyses.
In order to illustrate future reports, 10
Science exercises for age 17 were broken down 
along six dimensions (main effects). Sometime in 
the first half of 1971 all released exercises will be 
reported this way. Ten exercises alone are too few 
to use to draw any conclusions. Yet we may note
that for seven of these 10 exercises, 17s whose 
parents had some high school work but did not 
graduate were significantly (more than 1.5 standard
errors) below the national p-values. In contrast,
for 8 of the 10 exercises the 17s whose parents had
taken work beyond high school graduation scored 
significantly above the national p-values. For 17s 
whose parents graduated from high school but had 
not taken additional work, only one of the 10 
exercises showed any significant variation from the
national values. It will be interesting to find out 
whether this trend continues across all of the 
Science exercises. 
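
The criterion described above, a difference of more than 1.5 standard errors from the national p-value, can be written as a small check. This is only a sketch: the function name is hypothetical, the standard error is taken as given rather than computed, and the numbers in the usage line are invented for illustration and are not National Assessment results.

```python
def flag_difference(group_p, national_p, standard_error, threshold=1.5):
    """Classify a group p-value against the national p-value.

    Returns 'above', 'below', or 'no significant difference', using the
    rule quoted in the text: more than 1.5 standard errors.
    """
    difference = group_p - national_p
    if abs(difference) > threshold * standard_error:
        return "above" if difference > 0 else "below"
    return "no significant difference"


# Hypothetical numbers, for illustration only (not National Assessment results).
print(flag_difference(group_p=0.50, national_p=0.60, standard_error=0.04))
```

Applying the same check to each exercise and each parental-education group would give the kind of exercise-by-exercise count reported above.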

In some instances it may prove advantageous 
to look separately at those exercises that show 
great differences between groups and those that do 
not. Such analyses for blacks versus non-blacks 
may prove useful if they enable one to take a look 
at where achievement differences seem substantial 
versus where they are minimal. For example, an 
exercise requiring actual manipulation of weights 



on a balance beam was answered correctly by 50 
percent of the blacks and /8 percent of the whites.
In contrast the following exercise was answered 
correctly by 55 percent of the blacks and 54 
percent of the non-blacks: "A five-pound rock is 
dropped from a cliff 500 feet high. The longer the 
rock falls, the greater is its (speed)." Analyses and
evaluation of results such as these could keep 
educators, curriculum specialists, and sociologists 
busy for some time. 

The purpose of this article has been to present
some of the highlights of the first National Assessment
results. After five years of preparation and a
year of data collection, it now is possible to see the 
complete national results for Science and part of 
the national results for Citizenship (the rest are due
in print this fall or winter). A major step forward 
in information gathering in education has been 
taken. If this new information is to prove useful 
and helpful to laymen and educators, it must be 
widely disseminated, discussed, analyzed, eval- 
uated, cursed, and praised. NCME members are in a 
unique position to influence the ultimate success 
or failure of National Assessment by individual 
dissemination in schools (workshops, institutes,
curriculum libraries, etc.) and colleges (courses, 
libraries, workshops, etc.). Inquiries regarding 
National Assessment reports should be addressed 
to the Superintendent of Documents, Government
Printing Office, Washington, D.C. Additional
information about the National Assessment project 
can be secured from 2222 Fuller Road, 201A 
Huron Towers, Ann Arbor, Michigan, 48105. 